We study representation learning for efficient imitation learning over linear systems. In particular, we consider a setting where learning is split into two phases: (a) a pre-training step where a shared $k$-dimensional representation is learned from $H$ source policies, and (b) a target policy fine-tuning step where the learned representation is used to parameterize the policy class. We find that the imitation gap over trajectories generated by the learned target policy is bounded by $\tilde{O}\left( \frac{k n_x}{HN_{\mathrm{shared}}} + \frac{k n_u}{N_{\mathrm{target}}}\right)$, where $n_x > k$ is the state dimension, $n_u$ is the input dimension, $N_{\mathrm{shared}}$ denotes the total amount of data collected for each policy during representation learning, and $N_{\mathrm{target}}$ is the amount of target task data. This result formalizes the intuition that aggregating data across related tasks to learn a representation can significantly improve the sample efficiency of learning a target task. The trends suggested by this bound are corroborated in simulation.
translated by 谷歌翻译
In recent years, image and video delivery systems have begun integrating deep learning super-resolution (SR) approaches, leveraging their unprecedented visual enhancement capabilities while reducing reliance on networking conditions. Nevertheless, deploying these solutions on mobile devices still remains an active challenge as SR models are excessively demanding with respect to workload and memory footprint. Despite recent progress on on-device SR frameworks, existing systems either penalize visual quality, lead to excessive energy consumption or make inefficient use of the available resources. This work presents NAWQ-SR, a novel framework for the efficient on-device execution of SR models. Through a novel hybrid-precision quantization technique and a runtime neural image codec, NAWQ-SR exploits the multi-precision capabilities of modern mobile NPUs in order to minimize latency, while meeting user-specified quality constraints. Moreover, NAWQ-SR selectively adapts the arithmetic precision at run time to equip the SR DNN's layers with wider representational power, improving visual quality beyond what was previously possible on NPUs. Altogether, NAWQ-SR achieves an average speedup of 7.9x, 3x and 1.91x over the state-of-the-art on-device SR systems that use heterogeneous processors (MobiSR), CPU (SplitSR) and NPU (XLSR), respectively. Furthermore, NAWQ-SR delivers an average of 3.2x speedup and 0.39 dB higher PSNR over status-quo INT8 NPU designs, but most importantly mitigates the negative effects of quantization on visual quality, setting a new state-of-the-art in the attainable quality of NPU-based SR.
translated by 谷歌翻译
Recent image degradation estimation methods have enabled single-image super-resolution (SR) approaches to better upsample real-world images. Among these methods, explicit kernel estimation approaches have demonstrated unprecedented performance at handling unknown degradations. Nonetheless, a number of limitations constrain their efficacy when used by downstream SR models. Specifically, this family of methods yields i) excessive inference time due to long per-image adaptation times and ii) inferior image fidelity due to kernel mismatch. In this work, we introduce a learning-to-learn approach that meta-learns from the information contained in a distribution of images, thereby enabling significantly faster adaptation to new images with substantially improved performance in both kernel estimation and image fidelity. Specifically, we meta-train a kernel-generating GAN, named MetaKernelGAN, on a range of tasks, such that when a new image is presented, the generator starts from an informed kernel estimate and the discriminator starts with a strong capability to distinguish between patch distributions. Compared with state-of-the-art methods, our experiments show that MetaKernelGAN better estimates the magnitude and covariance of the kernel, leading to state-of-the-art blind SR results within a similar computational regime when combined with a non-blind SR model. Through supervised learning of an unsupervised learner, our method maintains the generalizability of the unsupervised learner, improves the optimization stability of kernel estimation, and hence image adaptation, and leads to a faster inference with a speedup between 14.24 to 102.1x over existing methods.
translated by 谷歌翻译
Artificial Intelligence (AI) is having a tremendous impact across most areas of science. Applications of AI in healthcare have the potential to improve our ability to detect, diagnose, prognose, and intervene on human disease. For AI models to be used clinically, they need to be made safe, reproducible and robust, and the underlying software framework must be aware of the particularities (e.g. geometry, physiology, physics) of medical data being processed. This work introduces MONAI, a freely available, community-supported, and consortium-led PyTorch-based framework for deep learning in healthcare. MONAI extends PyTorch to support medical data, with a particular focus on imaging, and provide purpose-specific AI model architectures, transformations and utilities that streamline the development and deployment of medical AI models. MONAI follows best practices for software-development, providing an easy-to-use, robust, well-documented, and well-tested software framework. MONAI preserves the simple, additive, and compositional approach of its underlying PyTorch libraries. MONAI is being used by and receiving contributions from research, clinical and industrial teams from around the world, who are pursuing applications spanning nearly every aspect of healthcare.
translated by 谷歌翻译
Recently, numerous studies have investigated cooperative traffic systems using the communication among vehicle-to-everything (V2X). Unfortunately, when multiple autonomous vehicles are deployed while exposed to communication failure, there might be a conflict of ideal conditions between various autonomous vehicles leading to adversarial situation on the roads. In South Korea, virtual and real-world urban autonomous multi-vehicle races were held in March and November of 2021, respectively. During the competition, multiple vehicles were involved simultaneously, which required maneuvers such as overtaking low-speed vehicles, negotiating intersections, and obeying traffic laws. In this study, we introduce a fully autonomous driving software stack to deploy a competitive driving model, which enabled us to win the urban autonomous multi-vehicle races. We evaluate module-based systems such as navigation, perception, and planning in real and virtual environments. Additionally, an analysis of traffic is performed after collecting multiple vehicle position data over communication to gain additional insight into a multi-agent autonomous driving scenario. Finally, we propose a method for analyzing traffic in order to compare the spatial distribution of multiple autonomous vehicles. We study the similarity distribution between each team's driving log data to determine the impact of competitive autonomous driving on the traffic environment.
translated by 谷歌翻译
基于注意力的神经网络在许多AI任务中都普遍存在。尽管其出色的算法性能,但注意力机制和前馈网络(FFN)的使用仍需要过多的计算和内存资源,这通常会损害其硬件性能。尽管已经引入了各种稀疏变体,但大多数方法仅着重于缓解算法级别上的二次注意力缩放,而无需明确考虑将其方法映射到真实硬件设计上的效率。此外,大多数努力仅专注于注意机制或FFN,但没有共同优化这两个部分,导致当前的大多数设计在处理不同的输入长度时缺乏可扩展性。本文从硬件角度系统地考虑了不同变体中的稀疏模式。在算法级别上,我们提出了Fabnet,这是一种适合硬件的变体,它采用统一的蝴蝶稀疏模式来近似关注机制和FFN。在硬件级别上,提出了一种新颖的适应性蝴蝶加速器,可以在运行时通过专用硬件控件配置,以使用单个统一的硬件引擎加速不同的蝴蝶层。在远程 - ARENA数据集上,FabNet达到了与香草变压器相同的精度,同时将计算量减少10到66次,参数数量为2至22次。通过共同优化算法和硬件,我们的基于FPGA的蝴蝶加速器在归一化到同一计算预算的最新加速器上达到了14.2至23.2倍的速度。与Raspberry Pi 4和Jetson Nano上优化的CPU和GPU设计相比,我们的系统在相同的功率预算下的最大273.8和15.1倍。
translated by 谷歌翻译
我们挑战AI模型,以“展示”对《纽约客》标题比赛的复杂多模式幽默的理解。具体而言,我们开发了三个精心限制的任务,以掌握图像和标题之间的潜在复杂和意外的关系,并且对人类经验的广泛品种产生了复杂和意外的寓意;这些是纽约口径卡通的标志。我们调查了直接将卡通像素和字幕输入的视觉和语言模型,以及仅通过提供图像的文本描述来规避图像处理的仅限语言模型。即使我们为卡通图像提供了丰富的多方面注释,我们也可以确定高质量的机器学习模型(例如,微调,175b参数语言模型)和人类之间的性能差距。我们公开发布我们的语料库,包括描述图像的位置/实体的注释,场景的不寻常以及对笑话的解释。
translated by 谷歌翻译
ICECUBE是一种用于检测1 GEV和1 PEV之间大气和天体中微子的光学传感器的立方公斤阵列,该阵列已部署1.45 km至2.45 km的南极的冰盖表面以下1.45 km至2.45 km。来自ICE探测器的事件的分类和重建在ICeCube数据分析中起着核心作用。重建和分类事件是一个挑战,这是由于探测器的几何形状,不均匀的散射和冰中光的吸收,并且低于100 GEV的光,每个事件产生的信号光子数量相对较少。为了应对这一挑战,可以将ICECUBE事件表示为点云图形,并将图形神经网络(GNN)作为分类和重建方法。 GNN能够将中微子事件与宇宙射线背景区分开,对不同的中微子事件类型进行分类,并重建沉积的能量,方向和相互作用顶点。基于仿真,我们提供了1-100 GEV能量范围的比较与当前ICECUBE分析中使用的当前最新最大似然技术,包括已知系统不确定性的影响。对于中微子事件分类,与当前的IceCube方法相比,GNN以固定的假阳性速率(FPR)提高了信号效率的18%。另外,GNN在固定信号效率下将FPR的降低超过8(低于半百分比)。对于能源,方向和相互作用顶点的重建,与当前最大似然技术相比,分辨率平均提高了13%-20%。当在GPU上运行时,GNN能够以几乎是2.7 kHz的中位数ICECUBE触发速率的速率处理ICECUBE事件,这打开了在在线搜索瞬态事件中使用低能量中微子的可能性。
translated by 谷歌翻译
学习在无人驾驶汽车(UAV)捕获的图像中检测物体(例如人类)通常会遭受无人机对物体的位置造成的巨大变化。此外,现有的基于无人机的基准数据集不提供足够的数据集元数据,这对于精确的模型诊断至关重要,并且学习功能不变。在本文中,我们介绍了大天使,这是第一个基于无人机的对象检测数据集,该数据集由具有相似想象条件以及无人机位置以及对象姿势元数据捕获的真实和合成子集组成。一系列实验经过精心设计,使用最先进的对象检测器设计,以证明在模型评估过程中利用元数据的好处。此外,还提供了几种涉及模型微调过程中涉及真实和合成数据的关键见解。最后,我们讨论了有关大天使的优势,局限性和未来方向,以突出其对更广泛的机器学习社区的独特价值。
translated by 谷歌翻译
脑小血管疾病的成像标记提供了有关脑部健康的宝贵信息,但是它们的手动评估既耗时又受到实质性内部和间际变异性的阻碍。自动化评级可能受益于生物医学研究以及临床评估,但是现有算法的诊断可靠性尚不清楚。在这里,我们介绍了\ textIt {血管病变检测和分割}(\ textit {v textit {where valdo?})挑战,该挑战是在国际医学图像计算和计算机辅助干预措施(MICCAI)的卫星事件中运行的挑战(MICCAI) 2021.这一挑战旨在促进大脑小血管疾病的小而稀疏成像标记的自动检测和分割方法的开发,即周围空间扩大(EPVS)(任务1),脑微粒(任务2)和预先塑造的鞋类血管起源(任务3),同时利用弱和嘈杂的标签。总体而言,有12个团队参与了针对一个或多个任务的解决方案的挑战(任务1 -EPVS 4,任务2 -Microbleeds的9个,任务3 -lacunes的6个)。多方数据都用于培训和评估。结果表明,整个团队和跨任务的性能都有很大的差异,对于任务1- EPV和任务2-微型微型且对任务3 -lacunes尚无实际的结果,其结果尤其有望。它还强调了可能阻止个人级别使用的情况的性能不一致,同时仍证明在人群层面上有用。
translated by 谷歌翻译